Learning from Delayed Rewards Using Influence Values Applied to Coordination in Multi-agent Systems
Authors
Abstract
In this work we propose a new paradigm for learning coordination in multi-agent systems. This approach is based on the social interaction of people, especially on the fact that people communicate to each other what they think about their actions, and this opinion has some influence on each other's behavior. We propose a model in which agents learn to coordinate their actions by giving opinions about the actions of other agents and by being influenced by the opinions other agents hold about their actions. We use the proposed paradigm to develop a modified version of the Q-learning algorithm. The new algorithm is tested and compared with independent learning (IL) and joint action learning (JAL) in a grid problem with two agents learning to coordinate. Our approach is more likely to converge to an optimal equilibrium than the IL and JAL Q-learning algorithms, especially as exploration increases. A further advantage of our algorithm is that, unlike JAL algorithms, it does not need to build a complete model of all joint actions. Keywords— Influence Value, Reinforcement Learning, Multi-agent coordination.
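The influence-value idea described above can be sketched as a Q-learning variant whose TD update is augmented by an opinion signal received from the other agent. The following is a minimal illustrative sketch, not the paper's exact algorithm: the class name, the opinion heuristic (reward relative to a running average), and the weighting parameter `beta` are assumptions for illustration.

```python
import random


class InfluenceQAgent:
    """Q-learning agent whose TD update is modulated by another agent's opinion.

    Simplified sketch of the influence-value idea; the opinion heuristic and
    parameters are illustrative, not taken from the paper.
    """

    def __init__(self, n_states, n_actions,
                 alpha=0.1, gamma=0.9, epsilon=0.1, beta=0.5):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.beta = epsilon, beta
        self.n_actions = n_actions
        self.avg_reward = 0.0
        self.steps = 0

    def act(self, state):
        # epsilon-greedy action selection
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        row = self.q[state]
        return row.index(max(row))

    def opinion(self, reward):
        # Opinion sent to the other agent: positive when the joint outcome
        # beats this agent's running average reward (a heuristic stand-in).
        return reward - self.avg_reward

    def update(self, s, a, r, s_next, influence):
        # Standard TD target plus the weighted opinion of the other agent.
        self.steps += 1
        self.avg_reward += (r - self.avg_reward) / self.steps
        target = r + self.beta * influence + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```

In a two-agent grid, after each joint step yielding rewards `r1` and `r2`, agent 1 would call `a1.update(s1, act1, r1, s1_next, influence=a2.opinion(r2))`, and symmetrically for agent 2, so that each agent's value estimates are nudged by the other's opinion rather than by a full joint-action model.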
Similar articles
Voltage Coordination of FACTS Devices in Power Systems Using RL-Based Multi-Agent Systems
This paper describes how multi-agent system technology can be used as the underpinning platform for voltage control in power systems. In this study, some FACTS (flexible AC transmission systems) devices are properly designed to coordinate their decisions and actions in order to provide a coordinated secondary voltage control mechanism based on multi-agent theory. Each device here is modeled as ...
Solving delayed coordination problems in MAS
Recent research has demonstrated that considering local interactions among agents in specific parts of the state space is a successful way of simplifying the multi-agent learning process. By taking other agents into account only when a conflict is possible, an agent can significantly reduce the state-action space in which it learns. Current approaches, however, consider only the immediate rewa...
Cooperative Multi-Agent Reinforcement Learning for Multi-Component Robotic Systems: guidelines for future research
The Reinforcement Learning (RL) paradigm aims to develop algorithms that allow training an agent to optimally achieve a goal with minimal feedback about the desired behavior, which is not precisely specified. Scalar rewards are returned to the agent in response to its actions, endorsing or opposing them. RL algorithms have been successfully applied to robot control design. The extension o...
CLEAN rewards for improving multiagent coordination in the presence of exploration
In cooperative multiagent systems, coordinating the joint actions of agents is difficult. One of the fundamental difficulties in such multiagent systems is the slow learning process, where an agent may not only need to learn how to behave in a complex environment, but may also need to account for the actions of the other learning agents. Here, the inability of agents to distinguish the true envir...
Reinforcement Learning in Large Multi-agent Systems
Enabling reinforcement learning to be effective in large-scale multi-agent Markov Decision Problems is a challenging task. To address this problem we propose a multi-agent variant of Q-learning: "Q Updates with Immediate Counterfactual Rewards-learning" (QUICR-learning). Given a global reward function over all agents that the large-scale system is trying to maximize, QUICR-learning breaks down...